Improved biclustering of microarray data demonstrated through systematic performance tests

نویسندگان

  • Heather L. Turner
  • Trevor C. Bailey
  • Wojtek J. Krzanowski
چکیده

A new algorithm is presented for 4tting the plaid model, a biclustering method developed for clustering gene expression data. The approach is based on speedy individual di6erences clustering and uses binary least squares to update the cluster membership parameters, making use of the binary constraints on these parameters and simplifying the other parameter updates. The performance of both algorithms is tested on simulated data sets designed to imitate (normalised) gene expression data, covering a range of biclustering con4gurations. Empirical distributions for the components of these data sets, including non-systematic error, are derived from a real set of microarray data. A set of two-way quality measures is proposed, based on one-way measures commonly used in information retrieval, to evaluate the quality of a retrieved bicluster with respect to a target bicluster in terms of both genes and samples. By de4ning a one-to-one correspondence between target biclusters and retrieved biclusters, the performance of each algorithm can be assessed. The results show that, using appropriately selected starting criteria, the proposed algorithm out-performs the original plaid model algorithm across a range of data sets. Furthermore, through the rigorous assessment of the plaid model a benchmark for future evaluation of biclustering methods is established. c © 2004 Elsevier B.V. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Query-based Biclustering using Formal Concept Analysis

Biclustering methods have proven to be critical tools in the exploratory analysis of high-dimensional data including information networks, microarray experiments, and bag of words data. However, most biclustering methods fail to answer specific questions of interest and do not incorporate prior knowledge and expertise from the user. To this end, query-based biclustering algorithms that are rece...

متن کامل

Biclustering with Background Knowledge using Formal Concept Analysis

Biclustering methods have proven to be critical tools in the exploratory analysis of high-dimensional data including information networks, microarray experiments, and bag of words data. However, most biclustering methods fail to answer specific questions of interest and do not incorporate background knowledge and expertise from the user. To this end, query-based biclustering algorithms have bee...

متن کامل

A memetic algorithm for discovering negative correlation biclusters of DNA microarray data

Most biclustering algorithms for microarrays data analysis focus on positive correlations of genes. However, recent studies demonstrate that groups of biologically significant genes can show negative correlations as well. So, discovering negatively correlated patterns from microarrays data represents a real need. In this paper, we propose a Memetic Biclustering Algorithm (MBA) which is able to ...

متن کامل

Methods to Bicluster Validation and Comparison in Microarray Data

There are lots of validation indexes and techniques to study clustering results. Biclustering algorithms have been applied in Systems Biology, principally in DNA Microarray analysis, for the last years, with great success. Nowadays, there is a big set of biclustering algorithms each one based in different concepts, but there are few intercomparisons that measure their performance. We review and...

متن کامل

به کارگیری خوشه‌بندی دوبعدی با روش «زیرماتریس‌های با میانگین- درایه‌های بزرگ» در داده‌های بیان ژنی حاصل از ریزآرایه‌های DNA

Background and Objective: In recent years, DNA microarray technology has become a central tool in genomic research. Using this technology, which made it possible to simultaneously analyze expression levels for thousands of genes under different conditions, massive amounts of information will be obtained. While traditional clustering methods, such as hierarchical and K-means clustering have been...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Computational Statistics & Data Analysis

دوره 48  شماره 

صفحات  -

تاریخ انتشار 2005